



International Journal of Trend in Scientific 
Research and Development (IJTSRD) 
International Open Access Journal 



ISSN No: 2456 - 6470 | www.ijtsrd.com | Volume - 2 | Issue - 3 




Outlier Detection using Reverse Nearest
Neighbor for Unsupervised Data


W. V. R. Manoj, V. Aditya Rama Narayana, 

A. Lakshmi Prasanna, A. Bhargavi, Md. Aakhila Bhanu 


Assistant Professor,

Dhanekula Institute of Engineering and Technology, Ganguru, Vijayawada, Andhra Pradesh, India 


ABSTRACT 


Data mining has become a popular technology that has
gained a lot of attention in recent times, and its
growing popularity and usage have brought many
issues. One of them is outlier detection: finding the
records in a dataset that do not follow the expected
patterns. To distinguish outliers from normal
behavior, detection techniques rely on key
assumptions. We present the reverse nearest neighbor
technique. There is a connection between hubs and
antihubs, outliers, and existing unsupervised
detection methods. Using the KNN method, it is
possible to identify outliers and to examine the
influence of antihub methods on real-life and
synthetic datasets. From this, we provide insight into
the role of reverse neighbor counts in unsupervised
outlier detection.

INTRODUCTION: 


Outliers are extreme values that deviate from other
observations in the data; they may indicate
variability in measurement or experimental errors.
That is, an outlier is an observation that deviates
from the overall pattern. Outliers can be divided into
two types: univariate and multivariate.

Univariate outliers can be found in the distribution of
values of a single feature. Multivariate outliers can be
found in a multi-dimensional space. Identifying
outliers in multi-dimensional distributions can be
very difficult for the human brain, which is why we
need to train a model to do it for us.

With a decrease in the rate of events relative to the
parameters present, the expected background data is
much smaller than the prediction of the Higgs theory.
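To illustrate the difference between univariate and multivariate outliers, the sketch below uses hypothetical data: points on the line y = x plus one point, (-5, 5), that lies off the line. A per-feature z-score test (a common univariate technique chosen here for illustration, not one the paper specifies; the threshold of 3.0 is also an assumption) does not flag it, because each coordinate is unremarkable on its own. The nearest-neighbor distance in two dimensions isolates it.

```python
import math
import statistics

# Illustrative sketch; the data and the 3.0 threshold are assumptions, not from the paper.

def zscore_outliers(values, threshold=3.0):
    """Univariate check: indices of values whose |z-score| exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]

def nn_distance(points, i):
    """Multivariate check: distance from points[i] to its nearest other point."""
    return min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)

# Hypothetical data: 21 points on the line y = x, plus one point (-5, 5) off the line.
points = [(x, x) for x in range(-10, 11)] + [(-5.0, 5.0)]
xs = [p[0] for p in points]
ys = [p[1] for p in points]

scores = [nn_distance(points, i) for i in range(len(points))]
most_isolated = max(range(len(points)), key=scores.__getitem__)

print(zscore_outliers(xs), zscore_outliers(ys))  # [] []: each coordinate looks normal alone
print(most_isolated, round(scores[most_isolated], 2))  # 21 7.07
```

The off-line point sits about 7.07 units from its nearest neighbor, while every point on the line has a neighbor at distance sqrt(2), so it stands out only when both features are considered together.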

Outlier detection methods can be divided into three
categories: supervised, semi-supervised, and
unsupervised, depending on the availability of labels
for outliers. Of these, unsupervised methods are the
most widely used, because the other categories
require accurate and representative labels that are
expensive to obtain. Unsupervised methods include
distance-based methods, which rely on a measure of
distance or similarity to detect outliers. It is well
known that, because of the curse of dimensionality,
distances lose their meaning: as dimensionality
increases, pairwise distances concentrate and become
hard to distinguish from one another. Distance-based
unsupervised outlier detection is therefore strongly
affected in high-dimensional data.
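The loss of distance contrast can be checked numerically. The sketch below computes the relative contrast (max - min) / min of the distances from one query point to a set of random points; the setting (uniform random data in the unit hypercube, 200 points, Euclidean distance, a fixed seed) is an assumption chosen for illustration.

```python
import math
import random

# Illustrative sketch; uniform data in the unit hypercube and 200 points are assumptions.

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from one random query to n random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    # The contrast shrinks as dim grows: nearest and farthest points look alike.
    print(dim, round(relative_contrast(dim), 3))
```

In two dimensions the nearest point is many times closer than the farthest one; in 1000 dimensions all distances fall within a narrow band around their mean, which is what makes raw distances less informative for outlier detection.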

Outlier Detection from Antihubs: 

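Antihub-based detection examines the nearest
neighbor lists of all points: antihubs are points that
appear among the k nearest neighbors of very few
other points, i.e., points with a low reverse nearest
neighbor count. Outliers can also be detected using
distance-based methods; for example, the KNN
method uses the distance to a point's last (k-th)
nearest neighbor.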
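Both ideas can be sketched in a few lines. The code below computes the KNN outlier score (distance to the k-th nearest neighbor) and the reverse nearest neighbor count N_k(x), i.e. how many points list x among their k nearest neighbors. The antihub-style score 1 / (1 + N_k) is one simple way to turn low reverse-neighbor counts into high outlier scores; the function names, the brute-force neighbor search, and the toy data are assumptions for illustration, not the authors' implementation.

```python
import math

# Illustrative sketch; names, brute-force search, and data are assumptions.

def knn_lists(points, k):
    """For each point, the indices of its k nearest neighbors (excluding itself)."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(others[:k])
    return neighbors

def knn_distance_scores(points, k):
    """KNN outlier score: distance to the k-th (last) nearest neighbor; higher = more outlying."""
    return [math.dist(points[i], points[nbrs[-1]])
            for i, nbrs in enumerate(knn_lists(points, k))]

def reverse_nn_counts(points, k):
    """N_k(x): how many points include x among their k nearest neighbors."""
    counts = [0] * len(points)
    for nbrs in knn_lists(points, k):
        for j in nbrs:
            counts[j] += 1
    return counts

def antihub_scores(points, k):
    """Antihub-style score: a low reverse-neighbor count gives a high outlier score."""
    return [1.0 / (1.0 + c) for c in reverse_nn_counts(points, k)]

# Hypothetical toy data: a tight square cluster plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
k = 3
print(reverse_nn_counts(points, k))
print([round(s, 2) for s in knn_distance_scores(points, k)])
print([round(s, 2) for s in antihub_scores(points, k)])
```

The distant point (5, 5) is never among anyone's three nearest neighbors, so its reverse count is 0 (an antihub) and it receives the highest score under both methods, while the central point (0.5, 0.5) acts as a hub.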

System Architecture: 

System architecture is a conceptual model that
defines the structure, behavior, and other views of a
system. An architecture description is a
representation of the system, organized in a way that
helps to evaluate the processes in the system and to
reason about its structure and behavior.
Our system architecture works as follows. From the
data dump we select the specific data we require,
which becomes the target data to which
preprocessing is applied. Preprocessing converts the
collected raw data into an understandable format;
since the data may be incomplete or insufficient,
preprocessing helps to resolve these issues. The
preprocessed data is then transformed, and the data
mining technique is applied to the transformed data
to obtain patterns. Evaluating the acquired patterns
yields the actual, factual information we require.
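The pipeline described above (selection, preprocessing, transformation, data mining, evaluation) can be sketched as a chain of small functions. This is a minimal illustration, not the authors' implementation: the function names, the record layout (dicts with an "id" key), dropping incomplete records as the preprocessing step, z-score normalisation as the transformation, and the k-th nearest neighbor distance as the mining step are all assumptions chosen for the example.

```python
import math
import statistics

# Illustrative sketch; every step below is an assumed, simplified stand-in.

def select(dump, fields):
    """Selection: keep each record's id and only the attributes we need (target data)."""
    return [(row["id"], [row.get(f) for f in fields]) for row in dump]

def preprocess(records):
    """Preprocessing: drop incomplete records (one simple way to handle missing values)."""
    return [(rid, feats) for rid, feats in records if None not in feats]

def transform(records):
    """Transformation: z-score normalise each feature so features share one scale."""
    cols = list(zip(*(feats for _, feats in records)))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    return [(rid, [(v - m) / s for v, m, s in zip(feats, means, stds)])
            for rid, feats in records]

def mine(records, k):
    """Data mining: score each record by the distance to its k-th nearest neighbor."""
    scored = []
    for i, (rid, p) in enumerate(records):
        dists = sorted(math.dist(p, q) for j, (_, q) in enumerate(records) if j != i)
        scored.append((rid, dists[k - 1]))
    return scored

def evaluate(scored, top_n):
    """Evaluation: report the ids of the top_n highest-scoring (most outlying) records."""
    return [rid for rid, _ in sorted(scored, key=lambda t: t[1], reverse=True)[:top_n]]

# Hypothetical data dump: record 4 is incomplete, record 6 lies far from the rest.
dump = [
    {"id": 1, "x": 1.0, "y": 2.0},
    {"id": 2, "x": 1.2, "y": 1.9},
    {"id": 3, "x": 0.9, "y": 2.1},
    {"id": 4, "x": 1.1, "y": None},
    {"id": 5, "x": 1.0, "y": 2.2},
    {"id": 6, "x": 8.0, "y": 9.0},
]
target = select(dump, ["x", "y"])
suspects = evaluate(mine(transform(preprocess(target)), k=2), top_n=1)
print(suspects)
```

Keeping each stage as a separate function mirrors the architecture description: any stage (for example, imputing missing values instead of dropping records) can be swapped without touching the others.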



CONCLUSION: 

With this, we can calculate the ratio of change
between the original data and the preprocessed data
in a high-dimensional dataset. When data is
preprocessed, it is formatted and changed from the
raw dump into an understandable form, with the help
of the methods above, i.e., distance-based methods,
KNN, and others. Although some of the original data
may be lost in this process, with the KNN approach
we can see all of the formatted data in a form that is
understandable and readable by everyone.